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Abstract. The Hybrid Monte Carlo (HMC) algorithm currently is the favorite scheme 
to simulate quantum chromodynamics including dynamical fermions. In this talk — 
which is intended for a non-expert audience — I want to bring together methodical and 
practical aspects of the HMC for full QCD simulations. I will comment on its merits 
and shortcomings, touch recent improvements and try to forecast its efficiency and role 
in future full QCD simulations. 

1 Introduction 

The Hybrid Monte Carlo algorithm (Duane et al. 1987) is — for the present — 
a culmination in the development of practical simulation algorithms for full 
quantum chromodynamics (QCD) on the lattice. QCD is the theory of the strong 
interaction. In principle, QCD can describe the binding of quarks by gluons, 
forming the hadrons with their masses, as well as other hadronic properties. 
As QCD cannot be evaluated satisfactorily by perturbative methods, one has 
to recourse to non-perturbative stochastic simulations of the quark and gluon 
fields on a discrete 4-dimensional space-time lattice (Crcutz 1983). In analogy to 
simulations in statistical mechanics, in a Markov chain, a canonical ensemble of 
field configurations is generated by suitable Monte Carlo algorithms. As far as 
full QCD lattice simulations are concerned, the HMC algorithm is the method 
of choice as it comprises several important advantages: 

— The evolution of the gluon fields through phase space is carried out simulta- 
neously for all d.o.f., as in a molecular dynamics scheme, using the leap-frog 
algorithm or higher order symplectic integrators. 

— Dynamical fermion loops, represented in the path-integral in form of a de- 
terminant of a huge matrix of dimension O(IO^) elements, i. e. a highly 
non-local object that is not directly computable, can be included by means 
of a stochastic representation of the fermionic determinant. This approach 
amounts to the solution of a huge system of linear equations of rank O(IO^) 
that can be solved efficiently with modern iteration algorithms, so-called 
Krylov-subspace methods (Frommcr et al. 1994), (Fischer et al. 1996). 

— As a consequence, the computational complexity of HMC is a number 0{V), 
i.e., one complete sweep (update of all V d.o.f.) requires 0{V) operations, 
as it is the case for Monte Carlo simulation algorithms of local problems. 

— HMC is exact, i. e. systematic errors arising from finite time steps in the 
molecular dynamics are eliminated by a global Monte Carlo decision. 
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— HMC is ergodic due to Langevin-like stochastic elements in the field update. 

— HMC shows surprisingly short autocorrelation times, as recently demon- 
strated (SESAM collaboration 1997). The autocorrelation determines the 
statistical significance of physical results computed from the generated en- 
semble of configurations. 

— HMC can be fully parallelized, a property that is essential for efficient simu- 
lations on high speed parallel systems. 

— HMC is computation dominated, in contrast to memory intensive alternative 

methods (Luescher 1994), (Slavnov 1996). Future high performance SIMD 
(single addressing multiple data) systems presumably are memory bounded. 

In view of these properties, it is no surprise that all large scale lattice QCD sim- 
ulations including dynamical Wilson fermions as of today are based on the HMC 
algorithm. Nevertheless, dynamical fermion simulations are still in their infancy. 
The computational demands of full QCD are huge and increase extremely if one 
approaches the chiral limit of small quark mass, i. e. the physically relevant mass 
regime of the light u and d quarks. The central point is the solution of the linear 
system of equations by iterative methods. The iterative solver, however, becomes 
increasingly inefficient for small quark mass. We hope that these demands can 
be satisfied by parallel systems of the upcoming tera-computer class (Schilling 
1997). 

The HMC algorithm is a general globalMonte Carlo procedure that can evolve 
all d.o.f. of the system at the same instance in time. Therefore it is so useful 
for QCD where due to the inverse of the local fermion matrix in the stochastic 
representation of the fermionic determinant the gauge fields must be updated 
all at once to achieve 0{V) complexity. The trick is to stay close to the surface 
of constant Hamiltonian in phase space, in order to achieve a large acceptance 
rate in the global Monte Carlo step. 

HMC can be applied in a variety of other fields. A promising novel idea 
is the merging of HMC with the multi-canonical algorithm (Berg & Neuhaus 
1992) which is only parallelizable within global update schemes. The parallel 
multi-canonical procedure, can be applied at the (first-order) phase transitions 
of compact QED and Higgs- Yukawa model. Another example is the Fourier accel- 
erated simulation of polymer chains as discussed in Anders Irback's contribution 
to these proceedings, where HMC well meets the non-local features of Fourier 
acceleration leading to a multi-scale update process. 

The outline of this talk is as follows: In section 2, a minimal set of elements 
and notions from QCD, necessary for the following, is introduced. In section 3, 
the algorithmic ingredients and computational steps of HMC are described. In 
section 4, I try to evaluate the computational complexity of HMC and suggest 
a scaling rule of the required CPU-time for vanishing Wilson quark mass. Using 
this rule, I try to give a prognosis as to the role of HMC in future full QCD 
simulations in relation to alternative update schemes. 
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2 Elements of Lattice QCD 

I intend to give a pedagogical introduction into the HMC evaluation of QCD in 
analogy to Monte Carlo simulations of statistical systems. Therefore, I avoid to 
focus on details. I directly introduce the physical elements on the discrete lattice 
that are of importance for the HMC simulation. For the following, we do not 
need to discuss their parentage and relation to continuum physics in detail. 

QCD is a constituent element of the standard model of elementary parti- 
cle physics. Six quarks, the flavors up, down, strange, charm, bottom, and top 
interact via gluons. In 4-dimensional space-time, the fields associated with the 
quarks, tp^ (x) have four Dirac components, a= 1, . . . , 4, and three color compo- 
nents, a = 1, . . . , 3. The 'color' degree of freedom is the characteristic property 
reflecting the non-abelian structure of QCD as a gauge theory. This structure is 
based on local SU(3) gauge group transformations acting on the color index. 

The gluon field ^^(x) consists of four Lorentz- vector components, /z = 1, . . . , 4. 
Each component carries an index a running from 1 to 8. It refers to the compo- 
nents of the eight gluon field in the basis of the eight generators Xa of the group 
SU(3). The eight 3x3 matrices Aq/2 are traceless and hermitean defining the 
algebra of SU(3) by , ^] = « f,jk ^-^ 

On the lattice, the quark fields ^„ are considered as approximations to the 
continuum fields V'(x), with x = an, n e N'* (All lattice quantities are taken 
dimensionless in the following.). As shown in Fig. 1, they 'live' on the sites. Their 
fermionic nature is expressed by anti-commutators. 



(1) 



characterizing the quark fields as Grassmann variables. The gluon fields in the 
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Fig. 1. 2-dimensional projection of the 4-dimensional euclidean space-time lattice. 



4-dimensional discretized world are represented as bi-local objects, the so-called 

^ For the explicit structure constants fijk and the generators Ai/2 see Cheng & Li, 
1989. 
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links U^{n). They are the bonds between site n and site n+e^, with being the 
unit vector in direction fj,. Unlike the continuum gluon field, the gluon in discrete 
space is e SU(3). t/^(n) is a discrete approximation to the parallel transporter 
known from continuum QCD, U{x,y) = exp (i^g rfa;"' A^(x')Aa/2), with 
being the strong coupling constant. 

QCD is defined via the action S = Sg + Sf that consists of the pure gluonic 
part and the fermionic action. The latter accounts for the quark gluon interac- 
tion and the fcrmion mass term. Taking the link elements from above one can 
construct a simple quantity, the plaquctte P^j/, see Fig. 1: 

P^,(n) = C/^(n)[/,(n + e^)[/t(n + e,)C/t(n). (2) 

The Wilson gauge action is defined by means of the plaquette: 

'3^9 = E [l - jTr(P^.(n) + Plin))] . (3) 

In the limit of vanishing lattice spacing, one can recover the continuum version 
of the gauge action, — / (i''xip)j,y(x)F^'^(x). The deviation from the continuum 
action due to the finite lattice spacing a is of O(a^). 

The discrete version of the fermionic action cannot be constructed by a simple 
differencing scheme, as it would correspond to 16 fermions instead of 1 fcrmion 
in the continuum limit. One method to get rid of the doublers is the addition 
of a second order derivative term, (■i/'n+ej, — 2'!/'„ — '0x-e,, )/2, to the standard 
first order derivative 7^9^j'(/;(x) 7^ ('(/;n-|-e,j — '0n-ej,)/2. This scheme is called 
Wilson fermion discretization. The fermionic action can be written as a bilinear 
form, Sj = V'nAfnyV'm, with the Wilson matrix M, 



M„m = ^nm - - 'y^l)U^,{^a■) 5„,m-e^(l + 'y^l)Ul{n - e^)5„,m+e^. (4) 

The stochastic simulation of QCD starts from the analogy of the pathinteg- 
ral — the quantization prescription — to a partition sum as known from statistical 
mechanics. As it is oscillating, it would be useless for stochastic evaluation. The 
appropriate framework for stochastic simulation of QCD is that of Euclidean field 
theory. Therefore, one performs a rotation of the time direction t — > ir. The ensu- 
ing effect is a transformation of the Minkowski metrics into a Euclidean metrics, 
while a positive definite Boltzmann weight exp{—(3Sg) is achieved. This form 
of the path-integral, i. e. the partition function, is well known from statistical 
mechanics: 

n['it^^(x)][#„][#„] ) e-"'^-^'. (5) 

It is important for the following that one can integrate out the bilinear Sf over 
the Grassmann fermion fields. As a result, we acquire the determinant of the 
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fermionic matrix: 



Z= fll[dU^in)]det{M[U])e-'^^^. (6) 



3 Hybrid Monte Carlo 

The Euclidean path-integral, Eq. (6), can in principle be evaluated by Monte 
Carlo techniques. We see that the fermionic fields do not appear in Z after the 
integration^. Hence, it suffices to generate a representative ensemble of fields 
{Ui}, i = 1, . . . ,N, and subsequently, to compute any observable along with the 
statistical error according to 

{0) = j^J20m and 4 = f^ 1^|0,[C/,]|2-(0)M. (7) 

i=l \ i=l / 

The integrated autocorrelation time tj^^ reflects the fact that the members of the 
ensemble are generated by importance sampling in a Markov chain. Therefore, a 
given configuration is correlated with its predecessors, and the actual statistical 
error of a result is increased compared to the naive standard deviation. The 
length of the autocorrelation time is a crucial quantity for the efficiency of a 
simulation algorithm. 



3.1 0{V) Algorithms for full QCD 

If we want to generate a series of field configurations Ui, U2, C/3. . . in a Markov 
process, besides the requirement for ergodicity, it is sufficient to fulfill the con- 
dition of detailed balance to yield configurations according to a canonical prob- 
ability distribution: 

e-^P{U U') = e-^'P{U' U). (8) 

P{U U') is the probability to arrive at configuration U' starting out from 

U . Let us for the moment forget about dct(Af[C/]), i.e., we set dct(Af) equal 
to 1 in Eq. (6). In that case, the action is purely gluonic (pure gauge the- 
ory), and local. Therefore, using the rules of Metropolis et al. we can update 
each link independently one by one by some (reversible!) stochastic modification 
Uij,(p) — >■ i7'^(n), while only local changes in the action are induced. One 'sweep' 
is performed if all links are updated once. By application of the Metropolis rule, 
P{U U') = min [1, exp(— Z\5g)], detailed balance is fulfilled, and we are guar- 
anteed to reach the canonical distribution. Starting from a random configuration, 
after some thermalization steps, we can assume hat the generated configurations 
belong to an equilibrium distribution. Without dynamical fermions — i. e. in the 

^ Similarly, one can perform the computation of any correlation function of ip and ip, 
leading to products of the quark propagator, i. e. the inverse of M~^. 
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quenched approximation standard Metropolis shows a complexity 0{V), with 
V being the number of d.o.f. 

However, if we try to use Metropolis for full QCD, the decision P{U — > 
U') = min 1, exp(— det^ffltf])^ would imply the evaluation of the fermionic 
determinant for each [//^(n) separately. A direct computation of the determinant 
requires 0{V^) operations and therefore, the total computational complexity 
would be a number 0{V^). 

These implications for the simulation of full QCD with dynamical fermions 
have been recognized very early. In a series of successful steps, the computa- 
tional complexity could be brought into the range of quenched simulations^. 
The following table gives an (incomplete) picture of this struggle towards exact, 
ergodic, practicable and parallelizable 0{V) algorithms for full QCD. A key step 



Table 1. Towards exact and ergodic 0{V) algorithms. 



Method 


order 


exact 


ergodic 


year 


Metropolis 




yes 


yes 


Metropolis et al. 1953 


Pseudo Fermions 




no 


yes 


Pucito et al. 1981 


Gauss Representation 




yes 


yes 


Petcher, Weingarten 1981 


Larigovin 


V 


no 


yes 


Parisi, Wu 1981 


Microcanonical 


V 


no 


no 


Polonyi at al. 1982 


Hybrid Molecular Dynamic:!- 


V 


no 


yes 


Duane 1985 


HMC 


v 


yes 


yes 


Duanc ot al. 1987 


Local Bosonic Algorithm 


V 


no 


yes 


Liischer 1994 


Exact LBA 


V 


yes 


yes 


DeForcrand et al. 1995 


5-D Bosonic Algorithm 


V 


no 


yes 


Slavnov 1996 



was the introduction of the fermionic determinant by a Gaussian integral. As a 
synthesis of several ingredients, HMC is a mix of Langevin simulation, micro- 
canonical molecular dynamics, stochastic Gauss representation of the fermionic 
determinant, and Metropolis. 



3.2 Hybrid Monte Ccirlo: Quenched Ccise 

For simplicity, I first discuss the quenched approximation, i. e. det(M) = const. 
Each sweep of the HMC is composed of two steps: 

1. The gauge field is evolved through phase space by moans of (micro-canonical) 
molecular dynamics. To this end, an artificial guidance Hamiltonian H is 
introduced adding the quadratic action of momenta to 5'g, "conjugate" to 
the gauge links. The micro-canonical evolution proceeds in the artificial time 

^ Take this cum grano salts. Two 0{V) algorithms can extremely differ in the coeffi- 
cient of V. 
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direction as induced by the Hamiltonian. Choosing random momenta at the 
begin of the trajectory, ergodicity is guaranteed, as it is by the stochastic 
force in the Langevin algorithm. In contrast to Langevin, HMC carries out 
many integration steps between the refreshment of the momenta. 
2. The equations of motion are chosen to conserve Ti. In practice, a numer- 
ical integration can conserve H, only approximately. However, the change 
AH = 7if — Tii is small enough to lead to high acceptances of the Metropo- 
lis decision — rendering HMC exact, the essential improvement of HMC com- 
pared to the preceding hybrid-molecular dynamics algorihm. 

With 

n = Sg[U] + ^ ^ Tri?2(n) and Z = j [dH][dU]e-'^ , (9) 

expectation values of observables are not altered with respect to Eq. (6), if the 

momenta are chosen from a Gaussian distribution. A suitable H is found using 
the fact that U e SU(3) under the evolution. Taylor expansion of U (r+Zir) leads 
to U{t)W{t) + U{t)W{t) = 0. This differential equation is fulfilled choosing 
the first equation of motion as 

if = iHU, (10) 

with H represented by the generators of SU(3) and thus being hermitean and 
traceless, H^{n) = X^^^i Ao/i^(n). Each component /i^ is a Gaussian distributed 
random number. As H should be a constant of motion, H = 0, we get 

?i = ^Tr i.H^{n)H^{n) - ^[U^{n)V^{n) + h.cA = 

11,(1 ^ 

n = J2T^JH^{n)\H^{n)-i(^{U^{n)V^{n)-h.c.) 1=0. (11) 



We note that [] oc 1 since {} must be traceless. Since H must stay explicitly 
traceless under the evolution it follows that 0=0- The second equation of 
motion reads: 

iH{n) = -| {U^{n)V^{n) - h.c.} . (12) 

The quantities V'^(n) corresponding to a gluonic force term are the staples, i. e. 
the incomplete plaquettes that arise in the differentiation, 

n ® u • (1^) 

X x+11 x-v J 

For exact integration, the Hamiltonian Ti would be conserved. However, numer- 
ical integration only can stay close to H = const. Therefore, one adds a global 
Metropolis step, 

Pacc = min(l,e-^«), (14) 
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to reach a canonical distribution for {U}. As a necessary condition for detailed 
balance the integration scheme must lead to a time reversible trajectory and fulfill 
Liouville's theorem, i. e. preserve the phase-space volume. Symplectic integration 
is the method of choice. It is stable as far as energy drifts are concerned. 



3.3 Including Dynamical (Wilson) Fermions 

Dynamical fermions are included in form of a stochastic Gaussian representation 
of the fermionic determinant in Eq. (6). In order to ensure convergence of the 
Gauss integral, the interaction matrix must be hermitean. Since the Wilson 
fermion matrix M is a complex matrix, it cannot be represented directly. A 
popular remedy is to consider the two light quarks u and d as mass degenerate. 
With the identity det^(M) = det(M^M) the representation reads 

det(MtM) = J ||n[#„][#„]j e-^"(^'^)",»^'". (15) 

The bosonic field cp can be related to a vector R of Gaussian random numbers. 

In a heat-bath scheme, it is generated using the standard Muller-Box proce- 
dure, and with (p = M'^R, we arrive at R^R, the desired starting distribution, 
equivalent to ^*{M^M)~^(j) = cj)* X. Adding the fermionic action to W, its time 
derivative reads: 

^ = K5^Tr[J7^(n)F^(n) + /i.c.], 

n,/i 

F,{n) = [MX]„+e,Xt(l+7^) + X„+e„[MX]1,(l-7^). (16) 
F is the fermionic force that modifies the second equation of motion to 

iH{u) = {[/^(n)y^(n) + KTrF^(n) - h.c.} . (17) 



3.4 Numerical Integration and Improvements 

The finite time-step integration of the equation of motion must be reversible 
and has to conserve the phase-space volume, while it should deviate little from 
the surface Ti. = const. The leap-frog scheme can fulfill these requirements. It 
consists of a sequence of triades of the following form: 

^f/.(n,r + ^) = //^(n,T) + ^£r^(n,T) 
?7^(n, T + Zir) = e^^"^''("'^+^)C/^(n, r) 

H^{n,T + AT)=H,{n,T+^) + ^H^{n,T + AT). (18) 

It can be shown that the leap frog scheme approximates Ti. correctly up to 
0{At^) for each triade. As a rule of thumb, the time step and the number of 
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integration steps, A^md, should be chosen such that the length of a trajectory 
in fictitious time is Nmd x — 0{1) at an acceptance rate > 70%. It is 
easy to see from the discrete equations of motion (EOAI) that the phase space 
volume [(iiJ][(iC/] is conserved: loosely speaking, dU is conserved as the first EOM 
amounts to a rotation in group space, and from the second EOM follows that 
dH' = dH. In order to improve the accuracy of the numerical integration, one 
can employ higher order symplcctic integrators'*. As the integration part of HMC 
is not specific for QCD, higher order integrators could be very useful for other 
applications as the Fourier accelerated HMC introduced by A. Irback. 

Despite of the reduction of the computational complexity to 0{V), the re- 
peated determination of the large "vector" X, X = [M'^ M)~^(j), renders the 
simulation of QCD with dynamical fermions still computationally extremely in- 
tensive. The size of the vector X is about 1-20 xlO^ words. The code stays 
more than 95 % of execution time in this phase. Since typical simulations run 
several months in dedicated mode on fast parallel machines, any percent of im- 
provement is welcome. Traditionally, the system was solved by use of Krylov sub- 
space methods such as conjugate gradient, minimal residuum or Gauss-Seidel. 
In the last three years, improvements could be achieved by introduction of the 
BiCGstab solver (Frommer et al. 1994) and by use of novel parallel precondition- 
ing techniques (Fischer et al. 1996) called local-lexicographic SSOR (symmetric 
successive over-relaxation). Further improvements have been achieved through 
refined educated guessing, where the solution X of previous steps in molecular 
dynamics time is fed in to accelerate the current iteration (Brower et al. 1997). 
Altogether, a factor of about 4 up to 8 could be gained by algorithmic research. 

4 Efficiency and Scaling 

Apart from purely algorithmic issues, the efficiency of a Monte Carlo simulation 
is largely determined by the autocorrelation of the Markov chain. A significant 
determination of autocorrelation times of HMC in realistic full QCD with Wilson 
fermions could not be carried out until recently (SESAM collaboration 1997). 
The length of the trajectory samples in these simulations was around 5000 (Here, 
with 'trajectory' we denote a new field configuration at the end of a Monte Carlo 
decision.). The lattice sizes were 16"^ x 32 and 24^ x 40. 

The finite time-series approximation to the true autocorrelation function for 
an observable Ot, t = 1, . . . , tMC, is defined as 




(19) 



tuc — t 



This strategy has been used so far only for fine-resolved integration of the gauge 
fields, and coarse resolved integration of the fermions (sparing inversions). For small 
quark masses, this approach cam fail, however. 
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The definition of corelation in an artificial time is made in analogy to connected 
correlation functions in real time. The integrated autocorrelation time is defined 

as Tj^j = I + X)t'*='i^°^ ^a(o) ■ In equilibrium, characterizes the statistical 
error of the observable O. 

The integrated autocorrelation times have been determined from several ob- 
servables, such as the plaquette and the smallest eigenvalue of M. They are 
smaller than anticipated previously, and their length is between 10 and 40 tra- 
jectories. Therefore, one can consider configurations as decorrelated that are 
separated by r^f. trajectories. 

The quality of the data allowed to address the issue of critical slowing down 
for HMC, approaching the chiral limit of vanishing u and d quark mass, where 
the pion correlation length = l/m,ra is growing. The autocorrelation time 
is expected to scale with a power of ^n-, t = e^^, z is called dynamical critical 
exponent. As a result the dynamical critical exponent of HMC is located between 
z = 1.3 and 1.8 for local and extended observables, respectively. 

Finally let me try to give a conservative guess of the computational effort 
required with HMC for de-correlation. The pion correlation length must be 
limited to V^/^t^ ~ 4 to avoid finite size effects as the pion begins to feel 
the periodic boundary of the lattice. With fixed, the volume factor goes as 

Furthermore the compute effort for BiCGstab increases oc ^"^'^ (SESAM 
collaboration 1997). In order to keep the acceptance rate constant, the time 
step has been reduced (from 0.01 to 0.004) with increasing lattice size (16^ x 32 
to 24^ X 40), while the number of time steps was increased from 100 to 125. 
Surprisingly, the autocorrelation time of the 'worst case' observable, the minimal 
eigenvalue of M, goes down by 30 % compensating the increase in acceptance 
rate cost! In a conservative estimate, the total time scales as to m^* '"'. As 
a result, for Wilson fermions, the magic limit of — < 0.5, will be in reach on 

' ' ° nip ' 

32^ X Lt lattices — on a Teracomputer. 

Alternative schemes like the local bosonic algorithm or the 5-dimcnsional 
bosonic scheme are by far more memory consuming than HMC. Here, a promis- 
ing new idea might be the Polynomial HMC (Prezzotti & Jansen). The autocor- 
relation times of these alternative schemes in realistic simulations arc not yet 
known accurately, however. In view of the advantages of HMC mentioned in the 
introduction, and the improvements achieved, together with the our new findings 
as to its critical dynamics, I reckon HMC to be the method of choice for future 
full QCD simulations on Teralcomputers. 

Acknowledgments. I thank the members of the SESAM and the TxL collab- 
orations and A. Frommer for many useful discussions. 
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